DISRUPTION OF THE KEX1 GENE IN PICHIA AND METHODS OF FULL LENGTH PROTEIN EXPRESSION

Statement of Governmental Rights 

This work was supported by an NIH grant. The U.S. Government may have certain rights in this invention.

Related Applications 

This application claims priority to U.S. Provisional Patent 
Application Serial No. 60/103,414 filed October 7, 1998. 
Technical Field 

The invention relates to protein expression and more specifically relates to genetically modified yeasts belonging to the genus Pichia and to their use to produce recombinant proteins.
Background of the Invention 

It is possible to modify microorganisms so that they produce proteins of interest such as, for example, mammalian proteins, artificial proteins, chimeric proteins, and the like. In particular, numerous genetic studies have been performed on the bacterium Escherichia coli and the yeast Saccharomyces cerevisiae. More recently, genetic tools have been developed for using the yeast Pichia pastoris as a host cell for the production of recombinant proteins.

Gram to kilogram quantities of proteins must be recombinantly produced for clinical trials. If such trials show promising results, hundreds or thousands of kilograms are typically necessary to meet the demands of patients worldwide.

Expression in Pichia offers many of the benefits of E. coli (high-level expression, easy scale-up, and inexpensive growth) combined with many of the advantages of expression in a eukaryotic system (protein processing, folding, and posttranslational modifications).

P. pastoris is a methylotrophic yeast. In the absence of a repressing carbon source, such as glucose, P. pastoris can use a metabolic pathway that allows it to utilize methanol as a carbon source. The alcohol oxidase promoter (AOX1) controls expression of alcohol oxidase, which catalyzes the first step in methanol metabolism. To compensate for the low specific activity of the enzyme, large quantities of alcohol oxidase are produced; typically more than 30% of the total soluble protein in methanol-induced cells is alcohol oxidase. The AOX1 promoter has been characterized and incorporated into a series of Pichia expression vectors so that this powerful promoter can drive high-level expression of recombinant proteins.

P. pastoris is a useful candidate for expressing proteins in sufficient quantities (see the review in Romanos, M.A. et al. (1992) Yeast 8, 423-488). Protein yields of more than 1 gram per liter have been described (Tschopp, J.R. et al. (1987) Bio/Technology 5, 1305-1308; Paifer, E. et al. (1994) Yeast 10, 1415-1419; Laroche, Y. et al. (1994) Bio/Technology 12, 1119-1124; and Rodriguez, M. et al. (1994) J. Biotechnol. 33, 135-146). Although every protein is expressed at a different level, most proteins are expressed at higher levels in P. pastoris than in bacterial, insect, or mammalian systems. P. pastoris is easily adapted to large-scale fermentation for the production of recombinant protein. Expression of recombinant proteins in P. pastoris has been carried out in fermentors as small as 1 liter and as large as 10,000 liters. Moreover, recombinant protein expression in P. pastoris is less expensive than expression in insect or mammalian systems. The growth and expression medium does not require expensive supplements such as fetal bovine serum, and expensive equipment such as CO2 incubators and tissue culture hoods is not required. Because there is a long history, and therefore a wealth of accumulated knowledge, regarding the growth of yeast cells in large fermentors, such as those used in brewing beer or producing penicillin, kilogram quantities of a protein can easily be purified.

One disadvantage of yeast expression systems in general is that recombinant proteins are frequently degraded at the N- and/or C-terminus (Romanos, M.A. et al. (1992) Yeast 8, 423-488; Scorer, C.A. et al. (1993) Gene 136, 111-119; Clare, J.J. et al. (1991) Gene 105, 205-212; Heim, J. et al. (1994) Eur. J. Biochem. 226, 341-353). In some cases this can impair the function of the protein. For example, it has been shown that the C-terminal arginines of the anaphylatoxins C3a and C5a are necessary for full biological activity (Skidgel, R.A. (1988) Trends Pharmacol. Sci. 9, 299-304). Some protease-deficient P. pastoris strains are available (Ohi, H. et al. (1996) Yeast 12, 31-40).

What is needed in the art, therefore, are improved recombinant constructs and methods of producing proteins in P. pastoris that do not result in undesired cleaved protein products.

WO 00/20610 PCT/US99/23351

Summary of the Invention 

The present invention provides methods and compositions for recombinant expression of full length proteins in P. pastoris. The invention provides genetic constructs containing a disruption in the KEX1 gene that prevents cleavage of basic amino acids, such as lysine, from the carboxy terminus.

Therefore, it is an object of the present invention to provide improved recombinant products expressed in the organism P. pastoris.

It is an object of the present invention to provide recombinant protein products expressed in the organism P. pastoris that are not undesirably cleaved at the carboxy terminus.

It is an object of the present invention to provide improved recombinant constructs containing a disruption in the KEX1 gene of P. pastoris that prevents cleavage of basic amino acids, such as lysine, from the carboxy terminus.

It is an object of the present invention to provide methods of using improved recombinant constructs containing a disruption in the KEX1 gene of P. pastoris that prevents cleavage of basic amino acids, such as lysine, from the carboxy terminus.

It is also an object of the present invention to provide improved recombinant constructs encoding a basic amino acid, such as lysine, at the carboxy terminus for use in a vector comprising a disrupted KEX1 gene of P. pastoris that prevents cleavage of basic amino acids, such as lysine, from the carboxy terminus.
These and other objects of the invention will become apparent to those skilled in the art upon a review of the detailed description and examples herein.
Brief Description of the Drawings 

Figures 1A and 1B illustrate the sequence of the Kex1p enzyme from P. pastoris. Figure 1A is the amino acid sequence of P. pastoris Kex1p (SEQ ID NO:1). Arrows indicate amino acid residues likely to be important for the catalytic activity of Kex1p. Filled triangles denote potential N-linked glycosylation sites.

Figure 1B is a comparison of the amino acid sequences of S. cerevisiae Kex1p (top) (SEQ ID NO:2) and P. pastoris Kex1p (bottom) (SEQ ID NO:3).

Figures 2A and 2B illustrate the genomic locus of the wild-type SMD1168 strain and the mutant kex1::SUC2 strain. Figure 2A is a genomic map around the wild-type KEX1 gene. The numbers in parentheses correspond to the base pair numbers in the GenBank file (AF095574).

Figure 2B is a genomic map of the kex1::SUC2 disruption strain ("ol" denotes the oligonucleotides used to verify successful disruption of the KEX1 gene).
Detailed Description of the Invention 

The present invention provides a recombinant nucleic acid construct comprising a disrupted KEX1 gene of Pichia that prevents cleavage of one or more basic amino acids from the carboxy terminus of a protein expressed therewith. In preferred embodiments, the Pichia is Pichia pastoris. In other preferred embodiments, the basic amino acid is lysine. In further preferred embodiments, the disruption is a nucleic acid deletion, insertion or addition within the KEX1 gene.

The present invention also provides a method of expressing a full length protein comprising transfecting a gene encoding the protein into a recombinant nucleic acid construct and promoting the expression of the gene, wherein the recombinant nucleic acid construct comprises a disrupted KEX1 gene of Pichia that prevents cleavage of one or more basic amino acids from the carboxy terminus of a protein expressed therewith. In preferred embodiments, the Pichia is Pichia pastoris. In other preferred embodiments, the basic amino acid is lysine. In further preferred embodiments, the disruption is a nucleic acid deletion, insertion or addition within the KEX1 gene.

The invention also provides a method for expressing a full length protein comprising transfecting a gene encoding the protein into a recombinant nucleic acid construct and promoting the expression of the gene, wherein the recombinant nucleic acid construct comprises a disrupted KEX1 gene of Pichia that prevents cleavage of one or more basic amino acids from the carboxy terminus of a protein expressed therewith, and wherein the gene has been modified to contain one or more basic amino acids at the carboxy terminus. In preferred embodiments, the Pichia is Pichia pastoris. In other preferred embodiments, the basic amino acid is lysine. In further preferred embodiments, the disruption is a nucleic acid deletion, insertion, substitution or addition within the KEX1 gene.
By the use of "a" or "an" herein is meant one or more than one, depending upon the context. By "protein" is meant any naturally occurring or modified sequence of amino acids, i.e. a polypeptide of any length. Such a protein can be a peptide fragment of a naturally occurring or modified protein, in addition to chimeric, fusion or labelled proteins, for example. The invention provides that a basic amino acid can be lysine, histidine, or arginine. Under some physiological conditions, basic amino acids can include asparagine and glutamine. Cleavage of such basic residues can be prevented by the present invention whether the basic amino acid is at or near the carboxy terminus of the protein of interest. The invention contemplates that more than one basic amino acid may be located at or near the carboxy terminus of a protein in order to facilitate expression of an increased amount of the full length desired protein.

Accordingly, the present invention provides a KEX1 disruption strain in which disruption of the KEX1 reading frame allows expression of an increased amount of full length protein with the carboxy terminus intact. The KEX1 disruption strain is useful for the full length expression of other proteins having a basic amino acid at, or in the region of, the carboxy terminus. Examples of proteins that can be expressed with the present invention include, but are not limited to, endostatin™ protein and the anaphylatoxins C3a and C5a. It has been shown that the carboxy terminal arginines of C3a and C5a are necessary for full biological activity.

The 3.5 kb P. pastoris KEX1 gene locus has been deposited in the GenBank database and is available under Accession Number AF095574. Any disruption of the KEX1 gene that results in the production of full length protein as desired is contemplated by the present invention. One example of a disruption involving a deletion and an insertion is described in the examples below, wherein a portion of the KEX1 gene is deleted and replaced with a portion of the Saccharomyces cerevisiae SUC2 gene.

The KEX1 disruption constructs of the present invention are also useful for producing proteins to whose carboxy terminus a basic amino acid has been purposely added in order to protect the carboxy terminus from degradation by other carboxypeptidases. The present invention thus provides for inhibiting the degradation of recombinant proteins by constructing the protein with a carboxy terminal region lysine, or other basic amino acid(s), that is compatible with protein activity.
In a related embodiment, the KEX1 disruption strain can be used to make epitope-labeled proteins. In recent years, epitope tagging has become useful because it allows proteins to be detected with highly specific, high-affinity antibodies within several weeks, without the need to express a sufficient quantity of protein for injection into rabbits and without depending on adequate antibody titers. The FLAG tag (Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys) (SEQ ID NO:4) is among the most frequently used epitope sequences. When secreted proteins tagged at the carboxy terminus with a FLAG tag are expressed in P. pastoris, the carboxy terminal lysine is removed and the tag is no longer recognized by the FLAG antibody. The KEX1 disruption of the present invention avoids this degradation.
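The loss of the FLAG epitope can be illustrated with a deliberately simplified model. The sketch below assumes, as a simplification not stated in the text, that a Kex1p-like carboxypeptidase B activity removes C-terminal Lys or Arg residues one at a time and stops at the first non-basic residue; real enzyme kinetics are ignored, and the passenger sequence `GSAGSAGS` and the function name are hypothetical.

```python
# Simplified model (assumption): Kex1p-like activity removes C-terminal
# Lys/Arg residues one at a time and stops at the first non-basic residue.
BASIC = {"K", "R"}
FLAG = "DYKDDDDK"  # Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys


def secreted_c_terminus(seq: str, kex1_active: bool = True) -> str:
    """Return the sequence remaining after C-terminal trimming in the toy model."""
    if not kex1_active:  # kex1 disruption strain: no Kex1p trimming
        return seq
    end = len(seq)
    while end > 0 and seq[end - 1] in BASIC:
        end -= 1
    return seq[:end]


tagged = "GSAGSAGS" + FLAG  # hypothetical passenger sequence plus C-terminal FLAG tag
for label, active in (("wild-type KEX1", True), ("kex1 disruption", False)):
    product = secreted_c_terminus(tagged, active)
    print(f"{label}: ...{product[-8:]}  FLAG intact: {product.endswith(FLAG)}")
```

Under this model the wild-type strain leaves the protein ending in Asp-Asp-Asp-Asp, which lacks the final lysine of the epitope, whereas the kex1 model returns the tag unchanged.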
Therefore, the present invention provides methods and genetic constructs that increase the amount of full length protein expressed therewith. In some cases, when protein is expressed in the KEX1-disrupted strain, a mixture of proteins is formed, some with intact carboxy termini and some without. It appears that lysis of yeast cells releases proteases into the culture supernatant. The invention provides a means to overcome this problem by using a double deletion strain, such as one obtained by crossing the KEX1 disruption strain with the PRC1 deletion strain (Ohi, H. et al. (1996) Yeast 12, 31-40). Carboxy terminal degradation can also be minimized by a short methanol induction phase. In the KEX1 deletion strain, degradation of the carboxy terminal lysine by other enzymes can be minimized by using either shaker flask purification or short fermentation runs. Furthermore, intact proteins can easily be separated from degraded forms by cation exchange chromatography, because the intact form carries one more positively charged lysine. Thus, the disrupted KEX1 constructs of the invention allow an increased amount of full length protein to be expressed.

Furthermore, in other embodiments of the invention, cloning of the KEX1 gene allows construction of an overexpression strain, which may be more efficient in cleaving carboxy terminal basic amino acids. Such a strain is useful for expressing proteins for which proteolytic activation and removal of carboxy terminal basic amino acids are important for generating active protein. A classic example of such a protein is insulin. A P. pastoris strain constructed to overexpress the Kex1p, Kex2p and Ste13p proteins is very efficient in secreting active insulin. Thim et al. (Proc. Natl. Acad. Sci. 83, 6766-6770 (1986)) have reported that in S. cerevisiae the proper processing of insulin is a limiting factor.

In the examples, the preferred SMD1168 strain of P. pastoris was used. SMD1168 is a pep4 mutant strain of P. pastoris that is deficient in proteinase A activity. Proteinase A is required for the proteolytic activation of a number of proteases, including carboxypeptidase Y. SMD1168 can therefore be used to reduce proteolysis of some recombinant proteins expressed in the strain.

In the examples, the preferred pPIC9K Pichia expression vector, also available from Invitrogen, was used. The pPIC9K vector allows one to obtain Pichia strains that contain multiple copies of the gene of interest. Several publications suggest that increasing the number of copies of the gene of interest in a recombinant Pichia strain may increase protein expression levels. The pPIC9K vector carries the kanamycin resistance gene, which confers resistance to G418 in Pichia. Spontaneous multiple insertion events, which occur in Pichia at a frequency of 1-10%, can be identified by the level of resistance to G418. Pichia transformants are selected on histidine-deficient medium and then screened for their level of resistance to G418. An increased level of resistance to G418 indicates multiple copies of the kanamycin resistance gene and of the gene of interest. The pPIC9K vector allows secreted expression of the gene of interest.
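The 1-10% multicopy frequency quoted above gives a rough idea of how many His+ transformants should be screened on G418. The sketch below assumes that each transformant is independently multicopy with a fixed probability p and finds the smallest number n with at least a 95% chance of including one multicopy clone, i.e. 1 - (1 - p)^n >= 0.95. The 95% target and the function name are illustrative choices, not taken from the text.

```python
import math


def transformants_to_screen(p_multicopy: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one multicopy clone among n) >= confidence.

    Assumes each transformant is independently multicopy with probability
    p_multicopy, so P(no multicopy clone among n) = (1 - p_multicopy) ** n.
    """
    if not 0.0 < p_multicopy < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_multicopy))


for p in (0.01, 0.10):  # the 1-10% range quoted in the text
    print(f"p = {p:.2f}: screen about {transformants_to_screen(p)} transformants")
```

Under these assumptions, about 29 transformants suffice at a 10% frequency and about 299 at a 1% frequency.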

The pAO815 vector is specially designed to generate multiple copies of the gene of interest in a single vector. The vector contains a Bgl II site upstream of the 5' AOX1 promoter and a unique BamH I site downstream of the 3' AOX1 transcription termination (TT) signal. Four steps are performed to generate multiple copies of the gene of interest. First, the gene is cloned into the unique EcoR I site in the vector. Then, the construct is digested with BamH I and Bgl II to release the "expression cassette" containing the AOX1 promoter, gene of interest, and 3' AOX1 TT. Next, multiple copies of the expression cassette are generated in vitro by ligation. Finally, the multiple copies are inserted back into the pAO815 vector and transformed into Pichia.
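The in vitro ligation step is possible because BamH I (G^GATCC) and Bgl II (A^GATCT) leave identical 5'-GATC overhangs, so cassette ends cut by either enzyme can be joined. The simplified string model below (not a cloning tool; the function names are hypothetical) shows which junction forms for each cassette orientation: the head-to-tail junction, where a BamH I end meets a Bgl II end, is a hybrid site recognized by neither enzyme, whereas head-to-head and tail-to-tail junctions regenerate a Bgl II or BamH I site.

```python
# Recognition sites (top strand, 5'->3'). Both enzymes cut after the first
# base, leaving the same 5'-GATC overhang, so their ends can be ligated.
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT"}


def junction(left_end: str, right_end: str) -> str:
    """Site formed when a left fragment whose end was cut by `left_end` is
    ligated to a right fragment whose end was cut by `right_end`."""
    return SITES[left_end][0] + "GATC" + SITES[right_end][-1]


def recut_by(site: str) -> list[str]:
    """Enzymes (of the two modeled) that recognize the junction."""
    return [name for name, seq in SITES.items() if seq == site]


# Each cassette has a Bgl II end on the 5' AOX1 side and a BamH I end
# on the 3' AOX1 TT side.
orientations = {
    "head-to-tail": junction("BamHI", "BglII"),
    "tail-to-tail": junction("BamHI", "BamHI"),
    "head-to-head": junction("BglII", "BglII"),
}
for name, site in orientations.items():
    print(f"{name}: {site} recut by {recut_by(site) or 'neither enzyme'}")
```

This difference means that redigestion with both enzymes can, in principle, distinguish head-to-tail tandem copies from inverted copies; the text does not state whether this was exploited.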

It should be understood that the compositions and methods disclosed and claimed herein are useful in other species and strains of Pichia and that other expression systems can be used. For example, Invitrogen also offers a Pichia methanolica expression system, to which the presently disclosed methods are applicable.

The invention is further illustrated by the following examples, which are not to be construed in any way as imposing limitations upon the scope thereof. On the contrary, it is to be clearly understood that resort may be had to various other embodiments, modifications, and equivalents thereof, which, after reading the description herein, may suggest themselves to those skilled in the art without departing from the spirit of the present invention.

Examples 

Endostatin™ protein is a potent angiogenesis inhibitor. When endostatin™ protein is expressed in Pichia pastoris, mass spectrometry analysis indicates that the expressed protein is truncated. N-terminal sequence analysis determined that the N-terminus was intact, suggesting that the C-terminal lysine was missing. In Saccharomyces cerevisiae, Kex1p, a carboxypeptidase B-like enzyme, can cleave lysine and arginine residues from the C-terminus of peptides and proteins. It was discovered that the KEX1 homologue in P. pastoris is responsible for the loss of the C-terminal lysine of endostatin™ protein.

Example 1: Construction of a P. pastoris genomic library 
The polymerase chain reaction (PCR) was performed according to standard protocols using either Taq (Boehringer Mannheim) or Pfu polymerase (Stratagene). All PCR fragments were cloned into the plasmid pCR™2.1, a TA-cloning vector (Invitrogen). Pfu-amplified fragments were isolated with the QIAquick PCR purification spin kit (Qiagen), incubated for 10 min with Taq polymerase (to add the 3' A overhangs needed for TA cloning, since Pfu leaves blunt ends), and ligated into the TA vector. Pfu polymerase was used to amplify the KEX1 gene locus from genomic DNA of the P. pastoris strain SMD1168 and the SUC2 gene from the S. cerevisiae strain S288C (ATCC 204508). Sequence analysis was performed on an ABI Prism Model 377, Version 3.0, in only one direction.

For PCR analysis of the KEX1 disruption strains, single colonies were inoculated into 1.5 ml YEPD medium and grown overnight to saturation. The cell pellet was resuspended in 200 µl of lysis buffer (1% SDS, 2% Triton X-100, 100 mM NaCl, 10 mM Tris pH 8.0, 1 mM EDTA). After resuspension, 200 µl of phenol-chloroform and 300 mg of acid-washed glass beads were added and the suspension was vortexed for 2 minutes. Following a 5 min centrifugation step, the supernatant was ethanol precipitated and 1/50 of the recovered DNA was used for PCR analysis.

DNA was isolated from the P. pastoris strain SMD1168 using the Qiagen Genomic DNA Purification Kit. Genomic DNA (2 µg) was digested for 4 h with 30 units of EcoRI. The digested DNA was repurified with the QIAEX II kit, and 100 ng were ligated into the Lambda ZAP II vector according to the protocol supplied by the manufacturer (Stratagene). Three µl of the ligation mix were used to package the vector into phage particles (Gigapack III Gold packaging kit, Stratagene). Eleven million primary clones were obtained and 200,000 clones were amplified. The Qiagen Lambda DNA isolation protocol was used to obtain phage DNA from agarose plate lysates. Phage DNA (200 ng) was used for PCR analysis.

Example 2: Expression of endostatin™ protein from P. 
pastoris 

The murine and human endostatin™ protein cDNAs were amplified by PCR and cloned into the XhoI/NotI sites of the expression vector pPIC9 (Invitrogen), yielding pPIC9-mES and pPIC9-hES, respectively. This cloning created a fusion protein of the alpha-factor secretion signal with endostatin. If correct processing by Kex2p and Ste13p occurs, murine endostatin™ protein should be secreted starting with the published N-terminus His-Thr-His-Gln-Glu-Phe, and human endostatin™ protein should start with His-Ser-His-Arg-Glu-Phe. These constructs were verified by sequencing. A 1.6 kb BamHI/XbaI fragment of pPIC9-mES and pPIC9-hES was subcloned into pPIC9K.
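The expected N-termini follow from how the alpha-factor fusion is processed. The toy model below assumes that Kex2p cleaves after the Lys-Arg pair at the end of the alpha-factor pro-region and that Ste13p then trims Glu-Ala dipeptides from the new N-terminus. The fragment `VSLEKREAEA` is an illustrative, shortened stand-in for the end of the pre-pro region, not the full pPIC9 signal sequence, and only the first six endostatin residues given in the text are used.

```python
# Toy model of alpha-factor fusion processing (simplified assumption):
#   1. Kex2p cleaves C-terminal to the Lys-Arg (KR) pair.
#   2. Ste13p removes N-terminal Glu-Ala (EA) dipeptides.
SIGNAL_END = "VSLEKREAEA"  # hypothetical, shortened stand-in for the pre-pro region


def process_alpha_factor_fusion(precursor: str) -> str:
    """Return the predicted secreted N-terminus after the Kex2p and Ste13p steps."""
    site = precursor.find("KR")
    if site == -1:
        raise ValueError("no Lys-Arg (Kex2p) site found")
    mature = precursor[site + 2:]   # Kex2p: cut after Lys-Arg
    while mature.startswith("EA"):  # Ste13p: trim Glu-Ala dipeptides
        mature = mature[2:]
    return mature


for name, start in {"murine": "HTHQEF", "human": "HSHREF"}.items():
    print(name, process_alpha_factor_fusion(SIGNAL_END + start))
```

If either step is incomplete in the cell, the secreted protein would carry extra N-terminal residues, which is why N-terminal sequencing is used to confirm processing.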

The plasmid was transformed into the SMD1168 strain. This strain has a deletion in the PEP4 gene, which is important for the activation of carboxypeptidase Y (CPYp). Although CPYp is not the only carboxypeptidase in yeast cells, the SMD1168 strain should have a lower capacity to degrade proteins from the C-terminus than the wild-type P. pastoris strain GS115. Integration of the murine construct into SMD1168, followed by selection on -His and G418 plates, was performed according to the manufacturer's instructions (Pichia Expression Kit, Invitrogen).

Example 3: Purification of endostatin™ protein from P. 
pastoris 

Murine endostatin™ protein was purified from the supernatant of yeast grown in shaker flasks. For shaker flask purification, the recommendations of the manufacturer (Invitrogen) were followed. Rich medium buffered with potassium phosphate at pH 6.0 was used. The cultures were induced with methanol for 48-72 h. During the induction period the pH increased to 6.5. Cells were removed by centrifugation, and the supernatant was slowly mixed with a saturated ammonium sulfate solution to a final concentration of 45% ammonium sulfate. The ammonium sulfate solution was prepared by dissolving 767 g of ammonium sulfate (ICN) in 1 L of 10 mM Tris, pH 7.0; the pH was readjusted to 7.0 and the solution was stored at 4° C. The ammonium sulfate/supernatant solution was stirred for 2 h at 4° C and centrifuged at 5000 rpm for 20 min. The precipitated proteins were resuspended in 0.1-0.2 starting volumes of phosphate buffered saline (PBS). The solution was dialyzed against PBS and applied directly to a heparin-Sepharose column. After the column was washed with PBS and with 0.25 M NaCl, 20 mM Tris, pH 7.4, endostatin™ protein was eluted using a 0.25 to 1.5 M NaCl gradient. Fractions containing endostatin™ protein were pooled, dialyzed against PBS and filter sterilized. The protein was stored at -20° C.
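The volume of saturated ammonium sulfate needed for the 45% final concentration can be estimated from simple mixing. The sketch below reads "45%" as 45% of saturation and assumes that the stock counts as 100% saturated, that the supernatant starts with no ammonium sulfate, and that volumes are additive; these are approximations, and the function name is hypothetical.

```python
def saturated_volume_to_add(sample_ml: float, target_fraction: float,
                            start_fraction: float = 0.0) -> float:
    """Volume (mL) of saturated ammonium sulfate to add to reach target_fraction
    of saturation, assuming ideal, additive volumes:

        (start * V0 + 1.0 * Va) / (V0 + Va) = target
        =>  Va = V0 * (target - start) / (1 - target)
    """
    if not 0.0 <= start_fraction < target_fraction < 1.0:
        raise ValueError("need 0 <= start_fraction < target_fraction < 1")
    return sample_ml * (target_fraction - start_fraction) / (1.0 - target_fraction)


va = saturated_volume_to_add(1000.0, 0.45)
print(f"Add about {va:.0f} mL of saturated solution per 1000 mL of supernatant")
```

Under these assumptions, reaching 45% saturation requires adding roughly 0.82 volumes of saturated solution.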
Murine and human endostatin™ protein were also purified from yeast culture supernatant harvested from a fermenter (5 liter bench-top B. Braun fermenter). Complex medium was used and the fermentation conditions recommended by Invitrogen were followed. The pH of the complex medium was kept constant at 6.5 because endostatin™ protein has been shown to be a zinc-binding protein. Zinc binding may be necessary for full biological activity, and it protects the N-terminal amino acids from proteases and therefore protects endostatin™ protein from inactivation (Boehm et al., submitted). Growing the yeast cells below pH 6.5 may cause dissociation of zinc from the binding pocket and may allow proteases to cleave off some N-terminal amino acids, which would result in the purification of inactive protein. Atomic absorption spectroscopy of endostatin™ protein isolated from shaker flasks and from a fermenter showed a 1:1 ratio of zinc to protein.
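Atomic absorption measures total zinc, so a zinc:protein ratio requires converting both concentrations to molar units. The sketch below uses the calculated mass of endostatin™ protein without its C-terminal lysine (20243.79 Da, from the mass spectrometry discussion in this example) and the standard atomic weight of zinc (65.38 g/mol); the 1.0 mg/ml protein concentration is a hypothetical input, not a measured value.

```python
ZN_G_PER_MOL = 65.38             # standard atomic weight of zinc
ENDOSTATIN_G_PER_MOL = 20243.79  # calculated mass without C-terminal Lys (from the text)


def zinc_to_protein_ratio(zinc_mg_per_l: float, protein_mg_per_ml: float,
                          protein_g_per_mol: float = ENDOSTATIN_G_PER_MOL) -> float:
    """Molar ratio of zinc to protein."""
    zinc_um = zinc_mg_per_l * 1000 / ZN_G_PER_MOL             # mg/L -> umol/L
    protein_um = protein_mg_per_ml * 1e6 / protein_g_per_mol  # mg/mL (= g/L) -> umol/L
    return zinc_um / protein_um


protein = 1.0  # mg/mL, hypothetical sample
# Zinc concentration (mg/L) that a 1:1 complex would give at this protein level.
zinc_for_1_to_1 = protein * 1e6 / ENDOSTATIN_G_PER_MOL * ZN_G_PER_MOL / 1000
print(f"{zinc_for_1_to_1:.2f} mg/L Zn -> ratio {zinc_to_protein_ratio(zinc_for_1_to_1, protein):.2f}")
```

At 1.0 mg/ml protein, a 1:1 complex therefore corresponds to roughly 3.2 mg/l of zinc.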

The methanol induction phase was either 40 h or 70 h. The supernatant was dialyzed against PBS and concentrated. Ammonium sulfate precipitation was not possible, so the solution was applied directly onto a heparin-Sepharose column. After the column was washed with PBS and with 0.25 M NaCl, 20 mM Tris, pH 7.4, endostatin™ protein was eluted using a 0.25 to 1.5 M NaCl gradient. Fractions containing endostatin™ protein were pooled and dialyzed against PBS. A second chromatography step was necessary to purify endostatin™ protein to apparent homogeneity: a HiPrep 26/60 Sephacryl S-200 HR gel filtration column (Pharmacia), equilibrated in PBS, was used. Fractions containing endostatin™ protein were filter sterilized and stored at -20° C at a protein concentration of 0.5-2.0 mg/ml, determined using the Bio-Rad Bradford assay with immunoglobulin as the standard protein.

Endostatin™ protein was purified to apparent homogeneity from the shaker flasks, as shown by SDS-PAGE. N-terminal sequence analysis showed the correct N-terminus, but mass spectrometry revealed that one amino acid was missing from the C-terminus. The molecular weight determined by mass spectrometry was 20248.4 Da, which is in good agreement with the calculated molecular weight of endostatin™ protein without a C-terminal lysine, 20243.79 Da. The calculated molecular weight of the endostatin™ polypeptide without the C-terminal lysine is 20247.79 Da, but 4 Da have been subtracted because two disulfide bonds are formed, each of which removes two hydrogen atoms. No free cysteines were found in endostatin™ protein purified from the P. pastoris culture medium. The formation of two disulfide bonds was also confirmed by the two recently published structures of mouse and human endostatin™ protein (Hohenester, E. et al. (1998) EMBO J. 17, 1656-1664; Ding, H-Y. et al. (1998) Proc. Natl. Acad. Sci. 95, 10443-10448). Mass spectrometry analysis also demonstrated that endostatin™ protein is not glycosylated when expressed in P. pastoris.

Purifying endostatin™ protein expressed in a fermenter to apparent homogeneity required both a heparin-Sepharose step and a gel filtration step. Under the high cell density growth conditions of the fermentation, a fraction of endostatin™ protein was further degraded. Mass spectrometry analysis indicated that the C-terminal serine exposed by removal of the lysine had also been removed. The calculated molecular weight of endostatin™ protein without the C-terminal lysine and serine, and with 4 Da subtracted for the two disulfide bonds, is 20156.7 Da.
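The molecular weights compared in this example can be reproduced with simple arithmetic. The sketch below starts from the calculated mass given in the text (20247.79 Da for the chain without the C-terminal lysine), subtracts 2 Da per disulfide bond as the text does, and uses standard average residue masses for lysine (128.17 Da) and serine (87.08 Da). The mass of the intact, lysine-containing form is derived here rather than reported in the text.

```python
LYS_RESIDUE = 128.17     # average residue mass of lysine (Da)
SER_RESIDUE = 87.08      # average residue mass of serine (Da)
NO_LYS_CHAIN = 20247.79  # calculated, no C-terminal Lys, cysteines reduced (from the text)
OBSERVED = 20248.4       # measured mass of shaker-flask material (from the text)


def with_disulfides(mass: float, bonds: int = 2) -> float:
    """Subtract 2 Da per disulfide bond, the rounding used in the text."""
    return mass - 2.0 * bonds


species = {
    "intact (with Lys)": with_disulfides(NO_LYS_CHAIN + LYS_RESIDUE),
    "minus Lys": with_disulfides(NO_LYS_CHAIN),
    "minus Lys and Ser": with_disulfides(NO_LYS_CHAIN - SER_RESIDUE),
}
for name, mass in species.items():
    print(f"{name:18s} {mass:9.2f} Da  observed - calculated = {OBSERVED - mass:+8.2f}")

best = min(species, key=lambda k: abs(OBSERVED - species[k]))
print("closest to observed:", best)
```

The lysine-free form is the only candidate within a few daltons of the measured 20248.4 Da, and the lysine-and-serine-free value (20156.71 Da) matches the 20156.7 Da quoted for the fermenter degradation product.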

Example 4: Cloning of the KEXl gene 
In S. cerevisiae, the KEX1 gene encodes a carboxypeptidase B-like enzyme that is involved in processing the killer toxin and alpha-factor (Dmochowska, A. et al. (1987) Cell 50, 573-584). Kex2p and Ste13p cleave off the alpha-factor signal sequence, which allows secretion of alpha-factor or of the protein of interest into the culture medium. Because the alpha-factor secretion signal was used for expression of endostatin™ protein, it was hypothesized that the Kex1p homologue in P. pastoris removed the C-terminal lysine. The P. pastoris KEX1 gene was cloned and disrupted to test this hypothesis.

A PCR approach was used to clone the KEX1 gene. S. cerevisiae Kex1p and the CPYp proteins from S. cerevisiae and P. pastoris contain a highly conserved motif around the active serine residue, Gly-Glu-Gly-Ser-Tyr-Ala-Gly (SEQ ID NO:5). Degenerate oligonucleotides based on this motif were designed, and the T3 primer from the vector sequence was used as the second primer. Phage library DNA (200 ng) was used as the template. The following degenerate oligonucleotide, combined with a T3 primer from the lambda vector (pBluescript), amplified a 1.8 kb fragment at an annealing temperature of 62° C: 5'-TG NCC CGC RTA RCT YTC ACC-3' (SEQ ID NO:6). Sequence analysis followed by a BLAST database search showed the highest homology to the S. cerevisiae KEX1 gene. The T7 primer from pBluescript and a reverse primer derived from the sequence of the 1.8 kb PCR fragment were used to obtain a 4.2 kb fragment 3' of the Gly-Glu-Gly-Ser-Tyr-Ala-Gly (SEQ ID NO:5) motif. A total of 3.5 kb was sequenced. Sequence and protein analyses were done with the Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, Wis. The KEX1 gene contains two consecutive ATG codons; the second ATG was taken as the first codon. The reading frame encodes 623 amino acids with a predicted molecular weight of 70,017.26 Da (Fig. 1A). The S. cerevisiae KEX1 gene encodes 729 amino acids. The difference of 106 amino acids is mainly due to a shortened C-terminus: the P. pastoris Kex1p is 75 amino acids shorter at the C-terminus than the S. cerevisiae homologue.
Amino acid comparison shows that these two proteins are only 36% identical and 43.7% similar (Fig. 1B). The N- and C-terminal regions are especially weakly conserved; comparing the two proteins without the first 80 and last 160 amino acids of the P. pastoris Kex1p increased the identity to 48% and the similarity to 57%. Nevertheless, the residues belonging to the catalytic triad of the S. cerevisiae Kex1p (Ser198, Asp406 and His470) are conserved (Shilton, B.H. et al. (1997) Biochemistry 36, 9002-9012). In addition, the amino acid residues close to these three residues are highly conserved. The glutamic acids that are important for the hydrolysis of peptides (Glu110 and Glu197 in the S. cerevisiae Kex1p) are also conserved in the P. pastoris Kex1p (Glu99 and Glu176; Stennicke, H.R. et al. (1996) Biochemistry 35, 7131-7141). Furthermore, the hydropathy profiles determined according to Engelman, D.M. et al. (1986) Annu. Rev. Biophys. Biophys. Chem. 15, 321-353 and Kyte, J. et al. (1982) J. Mol. Biol. 157, 105-132 were very similar, indicating structural conservation (not shown).
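The degenerate oligonucleotide used to clone KEX1 is really a pool of related sequences. Using the standard IUPAC codes (R = A or G, Y = C or T, N = any base), the sketch below counts and lists the members of the pool, showing that the primer is a mixture of 4 × 2 × 2 × 2 = 32 sequences. The helper names are hypothetical.

```python
from itertools import product

# Standard IUPAC codes for the symbols that occur in this primer.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG",    # purine
         "Y": "CT",    # pyrimidine
         "N": "ACGT"}  # any base

PRIMER = "TGNCCCGCRTARCTYTCACC"  # 5'-TG NCC CGC RTA RCT YTC ACC-3', spaces removed


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """All concrete sequences in the degenerate primer pool."""
    return ["".join(combo) for combo in product(*(IUPAC[base] for base in primer))]


variants = expand(PRIMER)
print(degeneracy(PRIMER), len(variants))  # 32 32
print(variants[0])                        # TGACCCGCATAACTCTCACC
```

The degeneracy matters in practice because each individual sequence makes up only 1/32 of the primer pool, which is one reason a relatively stringent annealing temperature can still give a specific product.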

A significant difference between these two sequences is that the P. pastoris Kex1p lacks the very distinct 105-residue stretch rich in aspartic and glutamic acids found in the S. cerevisiae sequence (Fig. 1B). Nevertheless, the P. pastoris Kex1p has a short hydrophilic, acidic region upstream of the conserved membrane-spanning domain. This transmembrane domain might target Kex1p to the late Golgi compartment, as has been shown for the S. cerevisiae Kex1p (Cooper, A. et al. (1989) Mol. Cell. Biol. 9, 2706-2714; Bryant, N.J. et al. (1993) J. Cell Sci. 106, 815-822). A second hydrophobic region is located at the N-terminus and may serve as a signal sequence (Watson, M.E.E. (1984) Nucl. Acids Res. 12, 5145-5156).
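Hydrophobic regions such as the membrane-spanning domain and the N-terminal signal sequence are usually located with a sliding-window hydropathy plot. The sketch below computes a Kyte-Doolittle profile; the window of 19 residues and the threshold of 1.6 are the values Kyte and Doolittle suggested for spotting candidate membrane-spanning segments, but the text does not say which parameters were used. Because the Kex1p sequences are not reproduced here, the example runs on a hypothetical sequence.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Hypothetical sequence: acidic stretch, 20-residue hydrophobic stretch, charged tail.
demo = "DEDEDEDEDE" + "L" * 20 + "KDEKDEKDEK"
hits = hydrophobic_windows(demo)
print(f"{len(hits)} windows above threshold, first starting at residue {hits[0] + 1}")
```

Only windows dominated by the hydrophobic stretch pass the threshold, while windows covering mostly the acidic or charged flanks do not, which mirrors how an acidic region upstream of a transmembrane domain appears in such a plot.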

The P. pastoris Kex1p has six potential N-linked
glycosylation sites (AsnXxxSer/Thr) compared to four in the S.
cerevisiae homologue (Fig. 1A). The two sites between amino acids
449 and 469 in the S. cerevisiae Kex1p and between amino acids 430
and 440 in the P. pastoris Kex1p are conserved (Fig. 1B).
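Potential N-linked glycosylation sites of this kind can be located by scanning a protein sequence for the Asn-Xaa-Ser/Thr motif. In the minimal sketch below, the exclusion of proline at the Xaa position is an assumption (a widely used refinement that the text above does not state), so it is exposed as a parameter. The function name is hypothetical.

```python
import re


def find_sequons(seq: str, exclude_proline: bool = True) -> list[int]:
    """Return 1-based positions of Asn in Asn-Xaa-Ser/Thr motifs.

    The pattern is a zero-width lookahead, so overlapping motifs are all reported.
    """
    pattern = r"(?=N[^P][ST])" if exclude_proline else r"(?=N.[ST])"
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]
```

For the toy sequence `"MNGSANPTQNKT"`, the scan reports positions 2 and 10; the Asn at position 6 is skipped only because it is followed by proline.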

Example 5: Disruption of the KEX1 gene with SUC2

In the next step, major parts of the KEX1 reading frame were
disrupted, and it was investigated whether endostatin™ protein could
be purified with its C-terminal lysine intact. To disrupt the KEX1
gene, a strategy was followed using the SUC2 gene from S.
cerevisiae as a selectable marker (Figs. 2A and 2B). The SUC2 locus
encodes invertase, which should allow P. pastoris cells to grow
efficiently on minimal sucrose plates, whereas P. pastoris cells
without SUC2 grow only slowly on minimal sucrose (MS) plates.

A 2.2 kb fragment of the KEX1 locus, including the ATG start codon
and the stop codon, was amplified from SMD1168 genomic DNA (Fig.
2B) and cloned into the TA-cloning vector pCR™2.1. The PCR
primers contained a 5' SmaI and a 3' SnaBI restriction site.
The resulting plasmid (pCR™2.1-KEX1) was amplified in the
methylation-deficient strain SCS110 (Stratagene) and cut with
ClaI/NdeI. A 2.9 kb segment of the SUC2 gene was PCR-amplified
from S288C genomic DNA with flanking ClaI and NdeI
restriction sites and cloned into the pCR™2.1-KEX1 plasmid. The
disruption plasmid (pCR™2.1-kex1::SUC2) was digested with
SmaI/SnaBI and transformed into SMD1168 using the Frozen-EZ
Yeast Transformation II kit (Zymo Research), and the yeast cells
were plated on minimal sucrose plates prepared as described by
Ohi, H. et al. (1996) Yeast 12, 31-40. Positive colonies were
restreaked on MS plates and analyzed by PCR. Successful
disruption of the KEX1 gene was confirmed using three different
PCR primer pairs.
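The cloning steps above rely on four restriction enzymes. As a hedged sketch, the code below locates their recognition sites in a linear DNA sequence and returns the fragment lengths of a digest. The recognition sequences and top-strand cut offsets are the standard ones for these enzymes; the function names and the toy sequence are hypothetical, and the sketch ignores methylation sensitivity and circular plasmids. Because all four sites are palindromic, scanning only the top strand finds every site.

```python
# Recognition sequence and top-strand cut offset (cut falls after this many bases).
ENZYMES = {
    "SmaI": ("CCCGGG", 3),   # CCC^GGG, blunt ends
    "SnaBI": ("TACGTA", 3),  # TAC^GTA, blunt ends
    "ClaI": ("ATCGAT", 2),   # AT^CGAT
    "NdeI": ("CATATG", 2),   # CA^TATG
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based positions in a linear sequence where `enzyme` cuts the top strand."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def digest(seq: str, *enzymes: str) -> list[int]:
    """Fragment lengths, in order, from digesting a linear sequence with `enzymes`."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For the toy 20 bp sequence `"AACCCGGGTTTTTACGTAGG"`, a SmaI/SnaBI double digest yields fragments of 5, 10 and 5 bp, analogous to releasing the disruption cassette from the plasmid.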

One of the SMD1168 kex1::SUC2 strains was transformed with the
endostatin™ protein expression vectors pPIC9K-mES and pPIC9K-hES,
and transformants were selected on -HIS plates and subsequently on
G-418 plates. Expression and purification of murine and human
endostatin™ protein in this KEX1 deletion strain were performed
essentially as described above.
Example 6: Mass spectrometry analysis 
Recombinant endostatin™ protein was assayed on a Voyager
DE-Elite or Voyager DE-STR Biospectrometry™ Workstation
(PerSeptive Biosystems). Molecular mass analysis was performed
on 2 µl of purified protein in PBS cocrystallized with 2 µl of a 10
mg/ml sinapinic acid/myoglobin solution (3,5-dimethoxy-4-
hydroxycinnamic acid in 1:2 v/v acetonitrile/water, 0.1% TFA, and
200 ng/ml horse myoglobin (Sigma) as internal standard). Under
these conditions the error of the mass determination is below 0.1%.
For endostatin (about 20 kDa), this corresponds to an expected error
of less than 20 Da. The determined masses deviate by less than 10 Da
from the calculated values. Therefore, single amino acid changes,
such as the loss of a C-terminal lysine (128 Da) or serine (87 Da),
can easily be detected.

Mass spectrometry analysis of murine endostatin™ protein
showed a molecular weight of 20378.7 Da, in agreement with the
calculated weight of 20371.79 Da. Four Da were subtracted from the
calculated weight to account for the two disulfide bonds, each of
which removes two hydrogen atoms. This result shows that disruption
of the KEX1 gene in P. pastoris allows expression of full-length
endostatin™ protein. Fermentation for 40 h yielded intact
endostatin™ protein. However, after a 70 h fermentation run, mass
spectrometry analysis of the purified endostatin™ protein revealed
that it was again degraded: a signal corresponding to endostatin™
protein without the C-terminal lysine appeared, as well as a small
peak corresponding to protein lacking both the C-terminal lysine
and serine.
Example 7: Cation exchange chromatography

The isoelectric point of intact murine endostatin™ protein is
8.93. Removal of the C-terminal lysine shifts the isoelectric point to
8.54. This difference in isoelectric point was exploited to separate
intact from degraded endostatin™ protein by cation exchange
chromatography. A MonoS HR 5/5 column (Pharmacia) was used
to separate intact endostatin™ protein from endostatin™ protein
lacking the C-terminal lysine. Between 0.1 and 1 mg of purified
endostatin™ protein in PBS was loaded onto the column using the
Pharmacia FPLC system. The column was washed with 50 mM HEPES,
pH 8.0, and endostatin™ protein was eluted with a 0 to 0.35 M NaCl
gradient.
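The shift in isoelectric point caused by losing one lysine can be illustrated with the Henderson-Hasselbalch equation: net charge is summed over the ionizable groups, and the pH at which it equals zero is found by bisection. The pKa values below are one illustrative set (an assumption; published sets differ), and the toy sequence is hypothetical, so this sketch will not reproduce the values of 8.93 and 8.54 reported above. It also counts every Cys as titratable, which would not hold for disulfide-bonded cysteines.

```python
# Illustrative pKa values (assumption; published sets differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at `ph` from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection for the pH where net charge is zero (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Removing a C-terminal lysine lowers the positive charge at every pH, so the computed pI always drops. At pH 8.0 the truncated form is therefore less positively charged, which is consistent with it binding the cation exchanger less tightly.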

Two protein peaks eluted from the column. The first peak
(eluting at approximately 180 mM NaCl) corresponds to
endostatin™ protein lacking the C-terminal lysine and, to some
extent, also the serine, whereas the second peak (eluting at
approximately 220 mM NaCl) was intact endostatin™ protein, as
determined by mass spectrometry. Because the truncated form carries
less positive charge at pH 8.0, it binds the cation exchanger less
tightly and elutes first. The ratio of intact to degraded
endostatin™ protein was about 52:48.

Human endostatin™ protein was also expressed in the KEX1 deletion
strain. Most of the protein showed a molecular weight close to the
calculated mass of 20091.5 Da (4 Da subtracted for the disulfide
bonds). There was also a signal corresponding to human endostatin™
protein without the C-terminal lysine, whose calculated mass is
19963.4 Da. The isoelectric point of human endostatin™ protein
drops from 9.49 to 8.93 without the C-terminal lysine, and the
degraded form could be separated from the intact form on a MonoS
column. Of the human endostatin™ protein, 87% was full length.

The above description is intended to be illustrative and not
restrictive. Many embodiments will be apparent to those of skill in
the art upon reading the above description. The scope of the
invention should, therefore, be determined not with reference to the
above description, but should instead be determined with reference
to the appended claims, along with the full scope of equivalents to
which such claims are entitled. The disclosures of all articles and
references referred to herein, including patents, patent applications,
and publications, are incorporated herein by reference.
What is claimed is:

1. A recombinant nucleic acid construct comprising a disrupted
KEX1 gene of Pichia that prevents cleavage of one or more
basic amino acids from the carboxy terminal of a protein
expressed therewith.

2. The construct of Claim 1, wherein the Pichia is a Pichia
pastoris.

3. The construct of Claim 1, wherein the basic amino acid is a
lysine, arginine or histidine.

4. The construct of Claim 1, wherein the basic amino acid is a
lysine.

5. The construct of Claim 1, wherein the disruption is a nucleic
acid deletion and insertion within the KEX1 gene.

6. The construct of Claim 1, wherein the disruption is a nucleic
acid deletion within a portion of the KEX1 gene, and
replacement thereof with a portion of Saccharomyces cerevisiae
SUC2 gene.

7. A method for expressing a full length protein comprising
transfecting a gene of interest into a recombinant nucleic acid
construct and promoting the expression of the gene of interest,
wherein the recombinant nucleic acid construct comprises a
disrupted KEX1 gene of Pichia that prevents cleavage of one or
more basic amino acids from the carboxy terminal of a protein
expressed therewith.

8. The method of Claim 7, wherein the Pichia is a Pichia
pastoris.

9. The method of Claim 7, wherein the basic amino acid is a
lysine, arginine or histidine.

10. The method of Claim 7, wherein the basic amino acid is a
lysine.

11. The method of Claim 7, wherein the disruption is a nucleic
acid deletion and insertion within the KEX1 gene.

12. The method of Claim 7, wherein the disruption is a nucleic
acid deletion within a portion of the KEX1 gene, and replacement
thereof with a portion of Saccharomyces cerevisiae SUC2 gene.

13. A method for expressing a full length protein comprising
transfecting a gene of interest into a recombinant nucleic acid
construct and promoting the expression of the gene encoding the
protein, wherein the recombinant nucleic acid construct comprises a
disrupted KEX1 gene of Pichia that prevents cleavage of one or
more basic amino acids from the carboxy terminal of a protein
expressed therewith, and wherein the gene has been modified to
contain one or more basic amino acids at the carboxy terminal of
the protein.

14. The method of Claim 13, wherein the Pichia is a Pichia
pastoris.

15. The method of Claim 13, wherein the basic amino acid is a
lysine, arginine or histidine.

16. The method of Claim 13, wherein the basic amino acid is a
lysine.

17. The method of Claim 13, wherein the disruption is a nucleic
acid deletion and insertion within the KEX1 gene.

18. The method of Claim 13, wherein the disruption is a nucleic
acid deletion within a portion of the KEX1 gene, and replacement
thereof with a portion of Saccharomyces cerevisiae SUC2 gene.

19. A protein expressed by the method of Claim 7. 

20. A protein expressed by the method of Claim 13. 
1 MLKLLCLLLP LVAVSASPID LGSQKDYLVL DLPGLSHLSE TQRPTMHAGL
51 LPLNLSFVAD DDTEYFFWRF SKQDVDRADl VFWLNGGPGC SSMDGAllAEL 
101 GPFVINPKQE VEYNEGTWVE AADWFVDQP GGTGFSSTTN YLTELTEVAD 
151 GFVTFLARYF HLFPADVYKK FTLGGE|YAG QYVPYILKA. MDDLKSDSGQ 
201 LPKELYU<GA LIGNGWIDPN EQSLSYLEFF IKKEUDIHG SYMPGLLQQQ 
251 EKCQNLINHS SGEASESQIS YSACEKILND ALRFTRDKKA PLDQQCINMY 
301 dytlrdty'^s CGMSWPPYLP DVTAFLQKKS VLEALHLDSS ASWSECSARV 
351 GSHLKNKISV PSVDILPDLL QEIPllLFNG DHnUCNClG TERMIDKLEF 
401 NGDQGFTBGT EY.PWFYNEV NVGKVISERN LTYVRWNSS «MVPFD^^rPV 
451 SRGLLDIYFD NFEDVEYNNV SG^TPVYDV DKNTTYIDSN DPRLQNGPKS 
501 SSTDDSAAHG NPFFYYVFEL FVIVLLLCGL VYLYQYYSNS APHS.LADKH 
551 KKKSKNKSKN VRFLDDLESN LDLDNTDDKK DNSVMSKLLS SMGYQAQEPY 
601 KPLDKGANAD LDIEMDSHGT SEK* 